N Semantic Classes are Harder than Two

نویسندگان

  • Ben Carterette
  • Rosie Jones
  • Wiley Greiner
  • Cory Barr
چکیده

We show that we can automatically classify semantically related phrases into 10 classes. Classification robustness is improved by training with multiple sources of evidence, including within-document cooccurrence, HTML markup, syntactic relationships in sentences, substitutability in query logs, and string similarity. Our work provides a benchmark for automatic n-way classification into WordNet’s semantic classes, both on a TREC news corpus and on a corpus of substitutable search query phrases.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The two be's of English

This  qualitative  study  investigates  the  uses  of  be  in  Contemporary  English.  Based  on  this  study, one  easy  claim  and  one  more  difficult  claim  are  proposed.  The  easy  claim  is  that  the  traditional distinction between be as a lexical verb and be as an auxiliary is faulty. In particular, 'copular-be', traditionally considered to be a lexical verb, is in fact a prototypi...

متن کامل

On the Role of Derivational Processes in the Formation of Non-Taxonomic Classes of Lexical Units in Russian

The paper is focused on classes of lexical units which arise as a result of derivational processes – word formation and semantic transfers, acting either in isolation or together, on the basis of common semantic foundations that bind targets and sources of derivation. The lexical items which constitute the classes under study vary in their denotative characteristics and due to their categ...

متن کامل

Controlled Bootstrapping of Lexico-semantic Classes as a Bridge between Paradigmatic and Syntagmatic Knowledge: Methodology and Evaluation

Semantic classification of words is a highly context sensitive and somewhat moving target, hard to deal with and even harder to evaluate on an objective basis. In this paper we suggest a step–wise methodology for automatic acquisition of lexico–semantic classes and delve into the non trivial issue of how results should be evaluated against a top–down reference standard.

متن کامل

Estimation of Variance of Normal Distribution using Ranked Set Sampling

Introduction     In some biological, environmental or ecological studies, there are situations in which obtaining exact measurements of sample units are much harder than ranking them in a set of small size without referring to their precise values. In these situations, ranked set sampling (RSS), proposed by McIntyre (1952), can be regarded as an alternative to the usual simple random sampling ...

متن کامل

Why Is Simulation Harder than Bisimulation?

Why is deciding simulation preorder (and simulation equivalence) computationally harder than deciding bisimulation equivalence on almost all known classes of processes? We try to answer this question by describing two general methods that can be used to construct direct one-to-one polynomial-time reductions from bisimulation equivalence to simulation preorder (and simulation equivalence). These...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006